NSF PAR Search | NSF Public Access Repository

On the causes, consequences, and avoidance of PCR duplicates: Towards a theory of library complexity

https://doi.org/10.1111/1755-0998.13800

Rochette, Nicolas C.; Rivera‐Colón, Angel G.; Walsh, Jessica; Sanger, Thomas J.; Campbell‐Staton, Shane C.; Catchen, Julian M. (August 2023, Molecular Ecology Resources)

Abstract Library preparation protocols for most sequencing technologies involve PCR amplification of the template DNA, which opens the possibility that a given template DNA molecule is sequenced multiple times. Reads arising from this phenomenon, known as PCR duplicates, inflate the cost of sequencing and can jeopardize the reliability of affected experiments. Despite the pervasiveness of this artefact, our understanding of its causes and of its impact on downstream statistical analyses remains essentially empirical. Here, we develop a general quantitative model of amplification distortions in sequencing data sets, which we leverage to investigate the factors controlling the occurrence of PCR duplicates. We show that the PCR duplicate rate is determined primarily by the ratio between library complexity and sequencing depth, and that amplification noise (including in its dependence on the number of PCR cycles) only plays a secondary role for this artefact. We confirm our predictions using new and published RAD‐seq libraries and provide a method to estimate library complexity and amplification noise in any data set containing PCR duplicates. We discuss how amplification‐related artefacts impact downstream analyses, and in particular genotyping accuracy. The proposed framework unites the numerous observations made on PCR duplicates and will be useful to experimenters of all sequencing technologies where DNA availability is a concern.
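The dependence of the duplicate rate on the ratio between library complexity and sequencing depth can be illustrated with a deliberately simplified sketch. It assumes every template molecule is amplified equally, so that reads are uniform draws with replacement from the library; this is an idealization chosen for illustration, not the quantitative model developed in the paper, which also accounts for amplification noise. The function names `expected_duplicate_rate` and `simulated_duplicate_rate` are hypothetical.

```python
import random


def expected_duplicate_rate(complexity: int, depth: int) -> float:
    """Expected fraction of duplicate reads under uniform sampling.

    Assumption (not from the paper): `depth` reads are drawn uniformly,
    with replacement, from `complexity` distinct template molecules.
    """
    if depth <= 0:
        return 0.0
    # Expected number of distinct molecules seen at least once.
    expected_unique = complexity * (1.0 - (1.0 - 1.0 / complexity) ** depth)
    return 1.0 - expected_unique / depth


def simulated_duplicate_rate(complexity: int, depth: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, for comparison."""
    rng = random.Random(seed)
    seen = {rng.randrange(complexity) for _ in range(depth)}
    return 1.0 - len(seen) / depth


if __name__ == "__main__":
    # Same depth/complexity ratio, different absolute sizes.
    for complexity, depth in [(10_000, 10_000), (100_000, 100_000), (100_000, 10_000)]:
        rate = expected_duplicate_rate(complexity, depth)
        print(f"complexity={complexity:>7} depth={depth:>7} duplicate rate={rate:.3f}")
```

Under this assumption, two libraries with the same depth-to-complexity ratio give nearly the same duplicate rate regardless of their absolute sizes, which is consistent with the ratio being the primary determinant.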
Chromosome-Level Genome Assembly and Circadian Gene Repertoire of the Patagonia Blennie Eleginops maclovinus—The Closest Ancestral Proxy of Antarctic Cryonotothenioids

https://doi.org/10.3390/genes14061196

Cheng, Chi-Hing Christina; Rivera-Colón, Angel G.; Minhas, Bushra Fazal; Wilson, Loralee; Rayamajhi, Niraj; Vargas-Chacoff, Luis; Catchen, Julian M. (June 2023, Genes)

The basal South American notothenioid Eleginops maclovinus (Patagonia blennie or róbalo) occupies a uniquely important phylogenetic position in Notothenioidei as the singular closest sister species to the Antarctic cryonotothenioid fishes. Its genome and the traits encoded therein would be the nearest representatives of the temperate ancestor from which the Antarctic clade arose, providing an ancestral reference for deducing polar-derived changes. In this study, we generated a gene- and chromosome-complete assembly of the E. maclovinus genome using long-read sequencing and HiC scaffolding. We compared its genome architecture with the more basally divergent Cottoperca gobio and the derived genomes of nine cryonotothenioids representing all five Antarctic families. We also reconstructed a notothenioid phylogeny using 2918 proteins of single-copy orthologous genes from these genomes that reaffirmed E. maclovinus’ phylogenetic position. We additionally curated E. maclovinus’ repertoire of circadian rhythm genes, ascertained their functionality by transcriptome sequencing, and compared its pattern of gene retention with C. gobio and the derived cryonotothenioids. Through reconstructing circadian gene trees, we also assessed the potential role of the retained genes in cryonotothenioids by referencing the functions of the human orthologs. Our results found E. maclovinus to share greater conservation with the Antarctic clade, solidifying its evolutionary status as the direct sister and best-suited ancestral proxy of cryonotothenioids. The high-quality genome of E. maclovinus will facilitate inquiries into cold-derived traits in temperate to polar evolution, and conversely on the paths of readaptation to non-freezing habitats in various secondarily temperate cryonotothenioids through comparative genomic analyses.
Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki

https://doi.org/10.1093/g3journal/jkac192

Rayamajhi, Niraj; Cheng, Chi-Hing Christina; Catchen, Julian M (July 2022, G3 Genes|Genomes|Genetics)

Abstract For any genome-based research, a robust genome assembly is required. De novo assembly strategies have evolved with changes in DNA sequencing technologies and have been through at least three phases: i) short-read only, ii) short- and long-read hybrid, and iii) long-read only assemblies. Each of the phases has its own error model. We hypothesized that hidden scaffolding errors in short-read assembly and erroneous long-read contigs degrade the quality of short- and long-read hybrid assemblies. We assembled the genome of T. borchgrevinki from data generated during each of the three phases and assessed the quality problems we encountered. We developed strategies such as k-mer-assembled region replacement, parameter optimization, and long-read sampling to address the error models. We demonstrated that a k-mer based strategy improved short-read assemblies as measured by BUSCO while mate-pair libraries introduced hidden scaffolding errors and perturbed BUSCO scores. Further, we found that although hybrid assemblies can generate higher contiguity, they tend to suffer from lower quality. In addition, we found long-read only assemblies can be optimized for contiguity by sub-sampling length-restricted raw reads. Our results indicate that long-read contig assembly is the current best choice and that assemblies from phase I and phase II were of lower quality.
Genomics of Secondarily Temperate Adaptation in the Only Non-Antarctic Icefish

https://doi.org/10.1093/molbev/msad029

Rivera-Colón, Angel G; Rayamajhi, Niraj; Minhas, Bushra Fazal; Madrigal, Giovanni; Bilyk, Kevin T; Yoon, Veronica; Hüne, Mathias; Gregory, Susan; Cheng, C H; Catchen, Julian M (March 2023, Molecular Biology and Evolution)
Kelley, Joanna (Ed.)
Abstract White-blooded Antarctic icefishes, a family within the adaptive radiation of Antarctic notothenioid fishes, are an example of extreme biological specialization to both the chronic cold of the Southern Ocean and life without hemoglobin. As a result, icefishes display derived physiology that limits them to the cold and highly oxygenated Antarctic waters. Against these constraints, remarkably one species, the pike icefish Champsocephalus esox, successfully colonized temperate South American waters. To study the genetic mechanisms underlying secondarily temperate adaptation in icefishes, we generated chromosome-level genome assemblies of both C. esox and its Antarctic sister species, Champsocephalus gunnari. The C. esox genome is similar in structure and organization to that of its Antarctic congener; however, we observe evidence of chromosomal rearrangements coinciding with regions of elevated genetic divergence in pike icefish populations. We also find several key biological pathways under selection, including genes related to mitochondria and vision, highlighting candidates behind temperate adaptation in C. esox. Substantial antifreeze glycoprotein (AFGP) pseudogenization has occurred in the pike icefish, likely due to relaxed selection following ancestral escape from Antarctica. The canonical AFGP locus organization is conserved in C. esox and C. gunnari, but both show a translocation of two AFGP copies to a separate locus, previously unobserved in cryonotothenioids. Altogether, the study of this secondarily temperate species provides an insight into the mechanisms underlying adaptation to ecologically disparate environments in this otherwise highly specialized group.
Simulation with RADinitio improves RADseq experimental design and sheds light on sources of missing data

https://doi.org/10.1111/1755-0998.13163

Rivera‐Colón, Angel G.; Rochette, Nicolas C.; Catchen, Julian M. (May 2020, Molecular Ecology Resources)

Abstract Restriction‐site associated DNA sequencing (RADseq) has become a powerful and versatile tool in modern population genomics, enabling large‐scale evolutionary and genomic analyses in otherwise inaccessible biological systems. With its widespread use, different variants on the protocol have been developed to suit specific experimental needs. Researchers face the challenge of choosing the optimal molecular and sequencing protocols for their reduced representation experimental design, an often‐complicated process. Strategic errors can lead to biased data generation that has reduced power to answer biological questions. Here, we present RADinitio, simulation software for the selection and optimization of RADseq experiments via the generation of sequencing data that behave similarly to empirical sources. RADinitio provides an evolutionary simulation of populations, implementation of various RADseq protocols with customizable parameters, and thorough assessment of missing data. We test the efficacy of the software using different RAD protocols across several organisms, highlighting the importance of protocol selection on the magnitude and quality of data acquired. Additionally, we test the effects of RAD library preparation and sequencing on allelic dropout, observing that library preparation and sequencing often contribute more to missing alleles than population‐level variation.
Stacks 2: Analytical methods for paired‐end sequencing improve RADseq‐based population genomics

https://doi.org/10.1111/mec.15253

Rochette, Nicolas C.; Rivera‐Colón, Angel G.; Catchen, Julian M. (October 2019, Molecular Ecology)

Abstract For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively‐parallel, short‐read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here, we describe the first software natively capable of using paired‐end sequencing to derive short contigs from de novo RAD data. Stacks version 2 employs a de Bruijn graph assembler to build and connect contigs from forward and reverse reads for each de novo RAD locus, which it then uses as a reference for read alignments. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes, generating RAD loci that are 400–800 bp in length. To prove its recall and precision, we tested the software with simulated data and compared reference‐aligned and de novo analyses of three empirical data sets. Our study shows that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired‐end de novo data sets.
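As a rough illustration of the general idea behind de Bruijn graph assembly, the toy sketch below builds a graph from the k-mers of a few overlapping reads and follows the unambiguous path to produce a contig. It is a textbook simplification and not the Stacks 2 implementation: it ignores sequencing errors, coverage thresholds, reverse complements, and paired-end information. The function names `build_de_bruijn` and `walk_contig` are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_contig(graph, start):
    """Extend a contig from `start` while exactly one unvisited successor exists."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop on cycles rather than loop forever
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    # Two overlapping toy reads standing in for forward and reverse reads.
    reads = ["ATGGCGT", "GCGTGCA"]
    graph = build_de_bruijn(reads, k=4)
    print(walk_contig(graph, "ATG"))
```

In this toy case the two reads share the k-mer `GCGT`, so the walk joins them into a single sequence longer than either read.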
Male and female contributions to behavioral isolation in darters as a function of genetic distance and color distance

https://doi.org/10.1111/evo.13321

Moran, Rachel L.; Zhou, Muchu; Catchen, Julian M.; Fuller, Rebecca C. (October 2017, Evolution)
Hybridization and postzygotic isolation promote reinforcement of male mating preferences in a diverse group of fishes with traditional sex roles

https://doi.org/10.1002/ece3.4434

Moran, Rachel L.; Zhou, Muchu; Catchen, Julian M.; Fuller, Rebecca C. (September 2018, Ecology and Evolution)